Computer and Modernization (计算机与现代化) ›› 2010, Vol. 1 ›› Issue (3): 141-3. doi: 10.3969/j.issn.1006-2475.2010.03.040

• Chinese Information Technology •

Application Study of Automatic Text Classification Combined with Language Model

ZHAO Min-ya (赵敏涯)

  1. Department of Computer Engineering, Suzhou Vocational University, Suzhou 215104, Jiangsu, China
  • Received: 2009-02-19 Revised: 1900-01-01 Online: 2010-03-20 Published: 2010-03-20

Abstract:

This paper studies the application of the bigram model, a statistical language model, to automatic text classification. The traditional vector space model (VSM) assumes that feature terms are mutually independent when computing text similarity. To address this shortcoming, the paper proposes a method that uses word-pair and word-order information to improve classification results. Experimental results show that the method is feasible and effective.

Key words: statistical language model, text classification, smoothing, bigram
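
The abstract does not give the model details, so the following is only a minimal sketch of the general idea: one bigram language model per class, with a document assigned to the class whose model gives it the highest probability. The class name `BigramClassifier`, add-one (Laplace) smoothing, the `<s>` start marker, uniform class priors, and the toy data are all assumptions made for illustration, not the paper's actual method.

```python
import math
from collections import Counter


class BigramClassifier:
    """Toy per-class bigram language model (illustrative, not the paper's)."""

    def __init__(self):
        self.bigrams = {}   # class label -> Counter of (previous word, word)
        self.contexts = {}  # class label -> Counter of previous-word counts
        self.vocab = set()  # all words seen in training, across classes

    def train(self, label, tokens):
        """Add one tokenized training document to the model for `label`."""
        seq = ["<s>"] + list(tokens)  # "<s>" marks the document start
        self.bigrams.setdefault(label, Counter()).update(zip(seq, seq[1:]))
        self.contexts.setdefault(label, Counter()).update(seq[:-1])
        self.vocab.update(tokens)

    def log_prob(self, label, tokens):
        """Log-probability of `tokens` under the bigram model of `label`."""
        seq = ["<s>"] + list(tokens)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen words
        total = 0.0
        for prev, word in zip(seq, seq[1:]):
            count = self.bigrams[label][(prev, word)]
            context = self.contexts[label][prev]
            # Add-one smoothing keeps unseen word pairs from scoring zero.
            total += math.log((count + 1) / (context + v))
        return total

    def classify(self, tokens):
        """Return the label whose model scores `tokens` highest."""
        return max(self.bigrams, key=lambda lab: self.log_prob(lab, tokens))


clf = BigramClassifier()
clf.train("A", ["dog", "bites", "man"])
clf.train("B", ["man", "bites", "dog"])
print(clf.classify(["dog", "bites", "man"]))
```

The two toy training documents contain exactly the same words, so a bag-of-words representation that treats terms as independent cannot tell the classes apart; the bigram model can, because it scores word pairs in order.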